(CVPR 2018) Future Frame Prediction for Anomaly Detection -- A New Baseline

Liu W, Luo W, Lian D, et al. Future frame prediction for anomaly detection–a new baseline[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 6536-6545.



1. Overview


1.1. Motivation

  • In video anomaly detection, almost all existing methods minimize the reconstruction error on the training data, which cannot guarantee a large reconstruction error for an abnormal event
  • The capacity of a DNN is high, so abnormal events may also be reconstructed well; larger reconstruction errors for abnormal events do not necessarily occur
  • Abnormal events are unbounded: it is almost infeasible to collect every kind of abnormal event in advance

The paper proposes to tackle the anomaly detection problem within a video prediction framework

  • first work to leverage the difference between a predicted future frame and its ground truth (GT) to detect anomalies
  • first work to introduce a temporal constraint into the video prediction task
  • in addition to the spatial constraints (intensity and gradient), a temporal (motion) constraint is imposed on the prediction


  • Hand-crafted Features. HOG, HOF
  • Deep Learning. ConvLSTM-AE
  • Video Frame Prediction
  • Least Square GAN



2. Architecture


2.1. Overview




2.2. Constraints

  • Intensity



  • Gradient



  • Motion



    f. the optical flow estimator, a pre-trained FlowNet
  • Adversarial (Least Square GAN)
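
The four constraints can be sketched in NumPy as follows. This is a minimal sketch, assuming the usual forms of these losses: squared L2 distance for intensity, L1 distance between absolute image gradients, L1 distance between optical flows, and least-squares GAN objectives. All function names are illustrative, `flow_fn` is a hypothetical stand-in for the pre-trained FlowNet, and whether the terms are summed or averaged is an assumption.

```python
import numpy as np


def intensity_loss(pred, gt):
    """Squared L2 distance between the predicted and ground-truth frames."""
    return float(np.sum((pred - gt) ** 2))


def gradient_loss(pred, gt):
    """L1 distance between absolute image gradients along both axes."""
    # Absolute differences between vertically adjacent pixels.
    dy_pred = np.abs(pred[1:, :] - pred[:-1, :])
    dy_gt = np.abs(gt[1:, :] - gt[:-1, :])
    # Absolute differences between horizontally adjacent pixels.
    dx_pred = np.abs(pred[:, 1:] - pred[:, :-1])
    dx_gt = np.abs(gt[:, 1:] - gt[:, :-1])
    return float(np.sum(np.abs(dy_pred - dy_gt)) + np.sum(np.abs(dx_pred - dx_gt)))


def motion_loss(pred_next, gt_next, gt_curr, flow_fn):
    """L1 distance between the flow of (predicted next, current) and (real next, current).

    flow_fn is a hypothetical callable standing in for a pre-trained flow network.
    """
    flow_pred = flow_fn(pred_next, gt_curr)
    flow_gt = flow_fn(gt_next, gt_curr)
    return float(np.sum(np.abs(flow_pred - flow_gt)))


def lsgan_generator_loss(d_fake):
    """Least-squares generator loss: push D's scores on predicted frames toward 1."""
    return float(0.5 * np.mean((d_fake - 1.0) ** 2))


def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares discriminator loss: real frames toward 1, predicted frames toward 0."""
    return float(0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2))
```

Note that a prediction that differs from the ground truth by a constant brightness offset has a nonzero intensity loss but a zero gradient loss, which is why the two spatial constraints complement each other.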


2.3. Loss Function



  • frames are normalized to [-1, 1]
  • frames are resized to 256x256
  • t=4; a random clip of 5 sequential frames is sampled (4 input frames plus the frame to predict)
  • batch size 4
  • loss weights for int, gd, op, adv: 1.0, 1.0, 2.0, 0.05
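
A small sketch of how these settings might fit together, assuming the generator objective is a weighted sum of the four constraint terms and that input frames are 8-bit. The names `normalize_frame`, `LOSS_WEIGHTS`, and `generator_objective` are illustrative and not taken from the paper.

```python
def normalize_frame(pixels):
    """Map 8-bit pixel values in [0, 255] to the range [-1, 1]."""
    return [p / 127.5 - 1.0 for p in pixels]


# Weights for the intensity, gradient, optical flow (motion), and adversarial terms.
LOSS_WEIGHTS = {"int": 1.0, "gd": 1.0, "op": 2.0, "adv": 0.05}


def generator_objective(losses, weights=LOSS_WEIGHTS):
    """Weighted sum of the individual loss terms (assumed form of the objective)."""
    return sum(weights[name] * losses[name] for name in weights)


# Example with made-up loss values, purely for illustration.
example = {"int": 0.5, "gd": 0.25, "op": 0.1, "adv": 1.0}
total = generator_objective(example)
```

With these weights the motion term counts twice as much as either spatial term, while the adversarial term contributes only a small fraction of its raw value.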

2.4. Anomaly Detection on Testing Data

  • Mathieu et al. show that Peak Signal to Noise Ratio (PSNR) is a better measure for image quality assessment (a higher PSNR means the frame is more likely to be normal)
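
PSNR between a predicted frame and its ground truth can be sketched as below, assuming the common definition 10 * log10(peak^2 / MSE). The function name `psnr` and the `max_val` parameter are illustrative; which peak value to use for frames normalized to [-1, 1] is an assumption left to the caller.

```python
import numpy as np


def psnr(pred, gt, max_val=1.0):
    """Peak Signal to Noise Ratio in dB between a predicted frame and its ground truth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    mse = float(np.mean((pred - gt) ** 2))
    if mse == 0.0:
        # A perfect prediction has unbounded PSNR.
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

A frame that is predicted well has a small mean squared error and therefore a high PSNR; a poorly predicted frame, as expected for an abnormal event, has a low PSNR.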



  • normalize the PSNR of all frames in each testing video to the range [0, 1] (min-max normalization over the video); the normalized value is the regular score of each frame
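
The per-video normalization can be sketched as plain min-max scaling of the PSNR values. The function name `regular_scores`, the handling of a video with constant PSNR, and the example threshold are assumptions made for illustration.

```python
def regular_scores(psnr_values):
    """Min-max normalize the PSNR values of one testing video to [0, 1]."""
    lo, hi = min(psnr_values), max(psnr_values)
    if hi == lo:
        # Assumption: if every frame has the same PSNR, treat all frames as regular.
        return [1.0 for _ in psnr_values]
    return [(p - lo) / (hi - lo) for p in psnr_values]


# Frames whose score falls below a chosen threshold are flagged as abnormal.
scores = regular_scores([20.0, 30.0, 40.0])
flags = [s < 0.3 for s in scores]  # 0.3 is an arbitrary illustrative threshold
```

Because the normalization is done per video, scores are comparable within a video but not directly across videos.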





3. Experiments


3.1. Dataset

3.1.1. CUHK Avenue Dataset

  • 16 training videos and 21 testing videos
  • 47 abnormal events

3.1.2. UCSD Dataset

  • UCSD Pedestrian 1. 34 training videos and 36 testing videos with 40 irregular events
  • UCSD Pedestrian 2. 16 training videos and 12 testing videos with 12 abnormal events

3.1.3. ShanghaiTech Dataset

  • 330 training videos and 107 testing videos
  • 130 abnormal events

3.1.4. Toy Dataset

  • 210 frames for training
  • 1242 frames for testing

3.2. Comparison



3.3. Ablation Study



  • Delta. gap between the average score of normal frames and that of abnormal frames; a larger gap means normal and abnormal frames are easier to separate
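
The gap can be computed from per-frame scores and frame-level labels as sketched below. The function name `score_gap` and the label convention (truthy means abnormal) are assumptions for illustration.

```python
from statistics import fmean


def score_gap(scores, labels):
    """Mean score of normal frames minus mean score of abnormal frames.

    labels[i] is truthy when frame i is abnormal (assumed convention).
    """
    normal = [s for s, is_abnormal in zip(scores, labels) if not is_abnormal]
    abnormal = [s for s, is_abnormal in zip(scores, labels) if is_abnormal]
    if not normal or not abnormal:
        raise ValueError("need at least one normal and one abnormal frame")
    return fmean(normal) - fmean(abnormal)


# Example with made-up scores: two normal frames followed by two abnormal frames.
gap = score_gap([0.9, 0.8, 0.2, 0.1], [0, 0, 1, 1])
```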




3.4. Toy Experiments



  • Once the pedestrian has been observed for a while and has made his or her choice, the motion becomes predictable and the PSNR goes up